Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation

Authors

Satoshi Tsutsui

David J. Crandall

Abstract

Recent work in computer vision has yielded impressive results in automatically describing images with natural language. Most of these systems generate captions in a single language, requiring multiple language-specific models to build a multilingual captioning system. We propose a very simple technique to build a single unified model across languages, using artificial tokens to control the language, making the captioning system more compact. We evaluate our approach on generating English and Japanese captions, and show that a typical neural captioning architecture is capable of learning a single model that can switch between two different languages.
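
As a rough illustration of the idea in the abstract, the sketch below tags each training caption with an artificial language token and builds one vocabulary shared across English and Japanese. The token names (`<en>`, `<ja>`, `<end>`), the helper functions, the sample captions, and the choice to place the token at the start of the caption are all assumptions made for this example; the abstract does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical language-control tokens; the names are illustrative only.
LANG_TOKENS: Dict[str, str] = {"en": "<en>", "ja": "<ja>"}
END_TOKEN = "<end>"  # assumed end-of-caption marker


def tag_caption(words: List[str], lang: str) -> List[str]:
    """Prefix a tokenized caption with its language token and append an end marker."""
    return [LANG_TOKENS[lang]] + words + [END_TOKEN]


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign one id per distinct token, shared across all languages."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(seq: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map a tagged caption to the integer ids a decoder would be trained on."""
    return [vocab[tok] for tok in seq]


# Toy captions for the same image in two languages (made-up data).
captions: List[Tuple[List[str], str]] = [
    (["a", "dog", "runs"], "en"),
    (["犬", "が", "走る"], "ja"),
]

tagged = [tag_caption(words, lang) for words, lang in captions]
vocab = build_vocab(tagged)
encoded = [encode(seq, vocab) for seq in tagged]
```

Under these assumptions, a single decoder trained on such sequences could be steered at inference time by feeding it the token of the desired language as its first input, so one model would serve both languages.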

Similar resources

Generating Image Descriptions using Multilingual Data

In this paper we explore several neural network architectures for the WMT 2017 multimodal translation sub-task on multilingual image caption generation. The goal of the task is to generate image captions in German, using a training corpus of images with captions in both English and German. We explore several models which attempt to generate captions for both languages, ignoring the English outp...

Cross-Lingual Image Caption Generation

Automatically generating a natural language description of an image is a fundamental problem in artificial intelligence. This task involves both computer vision and natural language processing and is called “image caption generation.” Research on image caption generation has typically focused on taking in an image and generating a caption in English as existing image caption corpora are mostly ...

Multilingual Multimodal Language Processing Using Neural Networks

We live in an increasingly multilingual multimodal world where it is common to find multiple views of the same entity across modalities and languages. For example, news articles which get published in multiple languages are essentially different views of the same entity. Similarly, video, audio and multilingual subtitles are multiple views of the same movie clip. Given the proliferation of such...

Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: the current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features have been apportioned for each group of images separately. In this regard, each group image surr...

Keyword Generation for Biomedical Image Retrieval with Recurrent Neural Networks

This paper presents the modeling approaches performed by the FHDO Biomedical Computer Science Group (BCSG) for the caption prediction task at ImageCLEF 2017. The goal of the caption prediction task is to recreate original image captions by detecting the interplay of present visible elements. A large-scale collection of 164,614 biomedical images, represented as imageID caption pairs, extracted f...

Journal:

CoRR

Volume abs/1706.06275

Publication date: 2017
